% Copyright 2019 by Till Tantau
%
% This file may be distributed and/or modified
%
% 1. under the LaTeX Project Public License and/or
% 2. under the GNU Free Documentation License.
%
% See the file doc/generic/pgf/licenses/LICENSE for more details.


% \section{Introduction to Data Visualization}
\section{数据可视化简介}

% \emph{Data visualization} is the process of converting \emph{data points,} which typically consist of multiple numerical values, into a graphical representation. Examples include the well-known function plots, but pie charts, bar diagrams, box plots, or vector fields are also examples of data visualizations.

\emph{数据可视化}是将通常由多个数值组成的\emph{数据点}转换为用图形表示的过程。包括众所周知的函数图形，饼图、条形图、箱形图和矢量图也可看成是数据可视化的示例。

% The data visualization subsystem of \pgfname\ takes a general, open approach to data visualization. Like everything else in \pgfname, there is a powerful, but not-so-easy-to-use basic layer in the data visualization system and a less flexible, but much simpler-to-use frontend layer. The present section gives an overview of the basic ideas behind the data visualization system.

\pgfname\ 的数据可视化子系统采用一种通用的开放式方法来进行数据可视化。 像\pgfname 中的所有其他内容一样，数据可视化系统中有一个功能强大但不太容易使用的基本层，以及灵活性较差但使用起来却更简单的前端层。 本节概述了数据可视化系统背后的基本思想。


% \subsection{Concept: Data Points}
\subsection{概念：数据点}
\label{section-dv-intro-data-points}

% The most important input for a data visualization is always raw data. This data is typically present in different formats and the data visualization subsystem provides methods for reading such formats and also for defining new input formats. However, independently of the input format, we may ask what kind of data the data visualization subsystem should be able to process. For two-dimensional plots we need lists of pairs of real numbers. For a bar plot we usually need a list of numbers, possibly together with some colors and labels. For a surface plot we need a matrix of triples of real numbers. For a vector field we need even more complex data.

数据可视化的最重要的输入始终是原始数据。 该数据通常以不同的格式存在，并且数据可视化子系统提供了读取此类格式以及定义新输入格式的方法。 但是，与输入格式无关，我们可能会问数据可视化子系统应该能够处理哪种数据。 对于二维图，我们需要成对的实数列表。 对于条形图，我们通常需要一个实数列表，可能还需要一些颜色和标签。 对于表面图，我们需要一个由实数组成的三维矩阵。 对于矢量场，我们需要更复杂的数据。

% The data visualization subsystem makes no assumption concerning which kind of data is being processed. Instead, the whole ``rendering pipeline'' is centered around a concept called the \emph{data point}. Conceptually, a data point is an arbitrarily complex record that represents one piece of data that should be visualized. Data points are \emph{not} just coordinates in the plane or the numerical values that need to be visualized. Rather, they represent the basic units of the data that needs to be visualized.

数据可视化子系统不假设正在处理哪种类型的数据。 取而代之的是，整个``渲染 管道''都围绕一个称为\emph{数据点}的概念。 从概念上讲，数据点是任意复杂的记录，代表应可视化的一条数据。 数据点\emph{不仅是}平面中的坐标或需要可视化的数值。 相反，它们代表需要可视化的数据的基本单位。

% Consider the following example: In an experiment we drive a car along a road and have different measurement instruments installed. We measure the position of the car, the time, the speed, the direction the car is heading, the acceleration, and perhaps some further values. A data point would consist of a record consisting of a timestamp together with the current position of the car (presumably two or three numbers), the speed vector (another two or three numbers), the acceleration (another two or three numbers), and perhaps the label text of the current experiment.

请考虑以下示例：在一个实验中，我们沿着道路驾驶汽车并安装了不同的测量仪器。 我们测量汽车的位置，时间，速度，汽车前进的方向，加速度以及可能的其他值。 数据点将由一个记录组成，该记录包括一个时间戳记以及汽车的当前位置（大概两个或三个数字），速度矢量（另一个两个或三个数字），加速度（另一个两个或三个数字）以及也许是当前实验的标签文字。

% Data points should be ``information rich''. They might even contain more information than what will actually be visualized. It is the job of the rendering pipeline to pick out the information relevant to one particular data visualization -- another visualization of the same data might pick different aspects of the data points, thereby hopefully allowing new insights into the data.

数据点应该是``信息丰富''的。 它们甚至可能包含比实际的可视化图形更多的信息。 呈现 管道的工作是挑选与一种特定数据可视化图形相关的信息——同一数据的另一种可视化图形可能会选择数据点的不同方面，从而有望允许对数据进行一种新的洞察。

% Technically, there is no special data structure for data points. Rather, when a special macro called |\pgfdatapoint| is called, the ``totality'' of all currently set keys with the |/data point/| prefix in the current scope forms the data point. This is both a very general approach and quite fast since no extra data structures need to be created.

从技术上讲，数据点没有特殊的数据结构。 而是当调用名为 |\pgfdatapoint| 的特殊宏时，当前作用域中所有的设置键带有 |/data point/| 前缀的``全体''构成数据点。 这既是一种非常通用的方法，又非常快，因为不需要创建额外的数据结构。


% \subsection{Concept: Visualization Pipeline}
\subsection{概念：可视化 管道}

% The \emph{visualization pipeline} is a series of actions that are performed on the to-be-visualized data. The data is presented to the visualization pipeline in the form of a long stream of  complex data points. The visualization pipeline makes several passes over this stream of data points. During the first pass(es), called the \emph{survey phase(s)}, information is gathered about the data points such as minimal and maximal values, which can be useful for automatic fitting of the data into a given area. In the main pass over the data, called the \emph{visualization phase}, the data points are actually visualized, for instance in the form of lines or points.

\emph{可视化 管道}是对要可视化的数据执行的一系列操作。 数据以一长串复杂数据点的形式呈现给可视化 管道。 可视化 管道对该数据点流进行了多次传递。 在被称为\emph{调查阶段}的在第一阶段期间，将收集有关数据点的信息，例如最小值和最大值，这对于将数据自动拟合到给定区域很有用。 在数据的主要传递过程中，称为\emph{可视化阶段}，实际上以线或点的形式可视化数据点。

% Like as for data points, the visualized pipeline makes no assumptions concerning what kind of visualization is desired. Indeed, one could even use it to produce a plain-text table. This flexibility is achieved by extensive use of objects and signals: When a data visualization starts, a number of signals (see Section~\ref{section-signals} for an introduction to signals) are initialized. Then, numerous ``visualization objects'' are created that listen to these signals. These objects are all involved in processing the data points. For instance, the job of an |interval mapper| object is to map one attribute of a data point, such as a car's velocity, to another, such as the $y$-axis of a plot. For each data point the different signals are raised in a certain order and the different visualization objects now have a chance of preparing the data point for the actual visualization. Continuing the above example, there might be a second |interval mapper| that takes the computed $y$-position and applies a logarithm to it, because a log-plot was requested. Then another mapper, this time a |polar mapper| might be used to map everything to polar coordinates. Following this, a |plot mark visualizer| might actually draw something at the computed position.

像数据点一样，可视化 管道不对所需的可视化做出任何假设。 确实，甚至可以使用它来生成纯文本表。 通过大量使用对象和信号来实现这种灵活性：数据可视化开始时，会初始化许多信号（有关信号的介绍，请参见\ref{section-signals}节）。 然后，创建了许多监听这些信号的``可视化对象''。 这些对象都涉及数据点的处理。 例如，|interval mapper| 对象的工作是将数据点的一个属性（例如汽车的速度）映射到另一个属性，例如图表的$y$轴。 对于每个数据点，按一定顺序发出不同的信号，并且现在不同的可视化对象都有机会为实际的可视化准备数据点。 继续上面的示例，可能还有第二个 |interval mapper|，它采用计算出的$y$位置并求其对数，因为请求了对数图。 然后是另一个映射器，这次是 |polar mapper|，可用于将所有数据点映射到极坐标。 此后，|plot mark visualizer| 可在实际上计算出的位置绘制一些东西。

% The whole idea behind the rendering pipeline is that new kinds of data visualizations can be implemented, ideally, just by adding one or two new objects to the visualization pipeline. Furthermore, different kinds of plots can be combined in novel ways in this manner, which is usually very hard to do. For instance, the visualization pipeline makes it easy to create, say, polar-semilog-box-plots. At first sight, such new kinds of plots may seem frivolous, but data visualization is all about gaining insights into the data from as many different angles as possible.

呈现 管道背后的整个想法是，理想地，只需在可视化管道中添加一个或两个新对象，就可以实现新型的数据可视化。 此此外，不同类型的图形可以通过这种方式以新颖的方式结合起来，这通常是很难做到的。 例如，可视化 管道可以轻松创建极坐标半对数箱形图。 乍一看，这种新的图可能看起来很无聊，但数据可视化就是要从尽可能多的不同角度来洞察数据。

% Naturally, creating new classes and objects for the rendering pipeline is not trivial, so most users will just use the existing classes, which should, thus, be as flexible as possible. But even when one only intends to use existing classes, it is still tricky to setup the pipeline correctly since the ordering is obviously important and since things like axes and ticks need to be configured and taken care of. For this reason, the frontend libraries provide preconfigured rendering pipelines so that one can simply say that a data visualization should look like a |line plot| with |school book axes| or with |scientific axes|, which selects a certain visualization pipeline that is appropriate for this kind of plot:

自然，为呈现管道创建新的类和对象并不是一件简单的事情，因此大多数用户将只使用现有的类，这样应该尽可能灵活。但是，即使只打算使用现有的类，正确地设置管道仍然需要技巧，因为顺序显然很重要，而且轴和刻度之类的东西需要配置和处理。因此，前端库提供了预先配置的渲染 管道，这样我们就可以简单地说，一个数据可视化应该看起来像一个带有 |school book axes| 或 |scientific axes| 的 |line plot|，它选择一个特定的可视化管道，适合这类图：

%
\begin{codeexample}[preamble={\usetikzlibrary{datavisualization.formats.functions}}]
\begin{tikzpicture}[scale=.7]
  \datavisualization [school book axes, visualize as smooth line]
  data [format=function] {
    var x : interval [-2:2];
    func y = \value x*\value x + 1;
  };
\end{tikzpicture}
\end{codeexample}
%
\begin{codeexample}[preamble={\usetikzlibrary{datavisualization.formats.functions}}]
\begin{tikzpicture}[scale=.7]
  \datavisualization [scientific axes, visualize as smooth line]
  data [format=function] {
    var x : interval [-2:2];
    func y = \value x*\value x + 1;
  };
\end{tikzpicture}
\end{codeexample}
%
% One must still configure such a plot (choose styles and themes and also specify which attributes of a data point should be used), but on the whole the plot is quite simple to specify. 

我们仍然必须配置这样的绘图（选择样式和主题，并指定应使用数据点的哪些属性），但总体而言，绘图的指定相当简单。



%%% Local Variables:
%%% mode: latex
%%% TeX-master: "pgfmanual"
%%% End: